-
- *****************************************************************
- * *
- * ASMGEN.COM - by J. Gersbach and J. Damke (Ver. 2.01) *
- * *
- * A program to generate cross-referenced assembly language code *
- * from any executable file. *
- * *
- *****************************************************************
-
-
-
- * PREFACE *
-
-
- This program will generate 8086/87/88 assembly code text
- that is compatible with the IBM Personal Computer Macro
- Assembler from any executable diskette file up to 65,535
- bytes. The output can be routed to the console or a disk-
- ette file. A reference list may be generated separately or
- embedded at the appropriate instruction counter address in
- the assembly code.
-
- Some manual touch up will be required before reassembly, but
- nearly all the typing is done for you by ASMGEN and anything
- questionable is marked with "??".
-
- A file of sequential instructions may reside on the same
- diskette to tell ASMGEN which addresses contain code, bytes,
- words, or strings. This file may also include instructions to
- assume segment register values or to toggle the output of
- assembly code text, the generation of the reference table,
- 8087 mnemonics, or the inclusion of embedded reference
- information in the assembly file.
-
- DEBUG may be used to browse through the executable file to
- determine the starting locations of code and data to develop
- the sequential instruction file. It is important to accu-
- rately specify these locations for an accurate reference
- table and a minimum of touching up of the ASM output text.
-
- The number of references within the file determines the amount
- of memory required, since a reference table is built in memory
- during the first pass. Disassembly is done from disk and only
- one file sector is in memory at any given time, so memory size
- does not limit the size of the file to be disassembled. 48K
- bytes of memory will be enough for most programs, but a few
- will need 64K or 128K. One diskette drive is sufficient, but
- two are more convenient.
-
-
- * STARTING ASMGEN *
-
- There are two ways to work with ASMGEN: either by using the
- command menu or by calling ASMGEN with parameters.
- Following are the descriptions of both options.
-
- * USING THE ASMGEN MENU *
-
- The program is invoked by typing: ASMGEN
-
- You are then prompted for a file specification. Respond with
- the name of the executable file from which you wish to
- generate the assembly code. The executable file will normally
- have an extension of .EXE or .COM. ASMGEN will check this
- file spec for validity and then respond with a prompt that
- includes a summary of the command letters indicating that
- you may give it a command. The executable file contents
- are not checked for valid code: given a text file or a
- compressed BASIC file, ASMGEN will still try to disassemble it
- and will produce unintelligible assembly code.
-
- The commands are:
-
- X filespec This file spec replaces any previous executable
- file spec. The usual file extension is .COM
- or .EXE
-
- EXAMPLE: X DATE.COM
-
-
- A <filespec> The executable file is disassembled and the
- assembly code is routed to the specified file.
- The usual file extension is .ASM. If the filespec
- is omitted, the output will default to the
- console.
-
- EXAMPLE: A DATE.ASM
-
- R <filespec> The reference table is sent to the file
- specified. The usual file extension is .TBL. If
- the filespec is omitted, the output will default
- to the console.
-
- EXAMPLE: R DATE.TBL
-
- Q The program is terminated and control returned to
- DOS.
-
-
- Each time a command has been executed, ASMGEN waits with a one
- line prompt for the next command.
-
- X <filespec>, A <CON>, R <CON> or Q ?
-
- The default filespec for each command is shown in brackets.
- Enter the next command of your choice as described above.
-
-
- * USING ASMGEN WITH PARAMETER CALLS *
-
- Up to three file specifications may be included when ASMGEN is
- first called from DOS. The executable file's name is given
- first, followed by specifications for the assembly and reference
- table files.
-
- EXAMPLE: ASMGEN DATE.COM, DATE.ASM, DATE.TBL
-
- If a semicolon follows the last filespec, ASMGEN will exit to DOS
- when the command has been executed. If no semicolon is entered,
- ASMGEN will display the menu options described above and wait for
- further input after executing the command.
-
- EXAMPLE: ASMGEN DATE.COM, DATE.ASM;
-
- If the filespec for the .ASM file and/or the .TBL file is
- omitted, ASMGEN generates first the .ASM file, then the .TBL
- file, naming each missing one after the first filespec.
-
- EXAMPLE: ASMGEN DATE.COM,,; creates DATE.ASM and DATE.TBL and
- exits to DOS.
-
- If only the reference table is desired, the dummy name NUL should
- be entered in place of the .ASM filespec.
-
- EXAMPLE: ASMGEN DATE.COM, NUL, DATE.TBL
-
- If only one filespec is given when the program is called, the
- reference table is built in memory and then the menu options are
- displayed for further commands.
-
- EXAMPLE: ASMGEN DATE.COM
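- The parameter rules above can be summarized in a short Python
- sketch. This is not ASMGEN's code; the helper name
- parse_asmgen_args is hypothetical, and it assumes that an empty
- field (as in DATE.COM,,;) means "use the default name" while a
- field that is missing altogether means "not requested".

```python
import os


def parse_asmgen_args(args):
    """Hypothetical sketch of ASMGEN's documented parameter rules.

    Returns the executable, the .ASM and .TBL targets (None means the
    output was not requested) and whether to exit to DOS afterwards.
    """
    args = args.strip()
    exit_to_dos = args.endswith(";")   # trailing semicolon: return to DOS
    if exit_to_dos:
        args = args[:-1]
    fields = [f.strip() for f in args.split(",")]
    exe = fields[0]
    base = os.path.splitext(exe)[0]    # DATE.COM -> DATE

    def target(index, ext):
        # Assumption: absent field = not requested, empty field = default name.
        if index >= len(fields):
            return None
        return fields[index] or base + ext

    return {
        "exe": exe,
        "asm": target(1, ".ASM"),      # NUL here discards the assembly text
        "tbl": target(2, ".TBL"),
        "exit": exit_to_dos,
    }
```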
-
-
- * PROGRAM EXECUTION *
-
- The disassembly is done in two passes through the source file.
- On pass #1, the reference table is built in memory and the actual
- output is generated during pass #2. Once the reference table is
- established, it remains in memory until an X or Q command is
- issued, and subsequent A and R command executions skip pass #1.
- This saves a lot of time when the executable file is large.
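- The reuse of the pass #1 result can be modeled as follows. The
- class and the build_refs/emit callbacks are hypothetical
- stand-ins for pass #1 and pass #2, assumed only for illustration.

```python
class AsmgenSession:
    """Hypothetical model of how the reference table survives between
    A and R commands until an X (or Q) command is issued."""

    def __init__(self, exe, build_refs, emit):
        self.exe = exe
        self.build_refs = build_refs   # stand-in for pass #1
        self.emit = emit               # stand-in for pass #2 / table output
        self.refs = None               # reference table, not built yet

    def _pass1(self):
        # Only the first A or R command after loading a file runs pass #1.
        if self.refs is None:
            self.refs = self.build_refs(self.exe)
        return self.refs

    def command_a(self, target="CON"):
        return self.emit(self.exe, self._pass1(), "asm", target)

    def command_r(self, target="CON"):
        return self.emit(self.exe, self._pass1(), "tbl", target)

    def command_x(self, new_exe):
        # A new executable invalidates the table held in memory.
        self.exe = new_exe
        self.refs = None
```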
-
- Three contiguous data areas are built dynamically in memory
- during pass #1. First is the compressed sequential instruction
- list. Second, for .EXE files, is a list of pointers to the
- locations of all relocatable values in the program, also
- arranged in numerical order. These two areas are established
- before any code is read. Third, the reference table is built in
- a higher area of memory as pass #1 progresses.
-
- If all available memory in the program segment is filled before
- the first two data areas are completed, ASMGEN aborts to the
- command prompt. If memory runs short after the reference table
- has been started, ASMGEN displays the message "Reference Table
- Incomplete Due to Insufficient Memory" and continues.
-
- Ctrl-Break may be used at any time to interrupt a command in
- progress.
-
-
- * READING THE ASSEMBLY CODE FILE (.ASM) *
-
- This file begins with a title taken from the executable file's
- name and date followed by the current date (in brackets).
-
- If not inhibited by the M switch in a SEQ file (explained
- later), the macro library will appear next in the file.
-
- Next will be a .RADIX 16 pseudo-op which tells the macro
- assembler that all numbers are in hexadecimal form.
-
- Then comes a header that indicates a starting value for the code
- segment, stack segment, instruction pointer and the stack
- pointer. The stack pointer is usually set to FFFF for .COM files
- but may be somewhat less depending on available memory. These
- values are passed by the linker for .EXE files.
-
- The first ASSUME statement might come next. There is one
- generated for each segment that begins with code. All segment
- registers are designated according to the current set of ASSUMEs.
- They will sometimes be incorrect, so all ASSUME statements should
- be checked prior to re-assembly.
-
- The disassembled output follows, terminated by an END statement
- and the execution address. An ORG pseudo-op is included if
- required.
-
- The text is compatible with the IBM Macro Assembler and the
- format is the same except for RETurns. To avoid the need for
- PROCedure titles, special mnemonics are provided for all RET
- instructions. These are defined in the macro library at the
- beginning of the file. Only macros that are needed for the
- current file are produced. The optional embedded reference
- table entries enhance the readability of the file.
- For very large files, this is sometimes undesirable and a
- separate reference table is best.
-
- When invalid instructions are encountered in code areas, they are
- reproduced as byte values followed by "??". If a near jump
- targets a location earlier in the code that is within range of
- a short jump, a NOP instruction is inserted after the jump,
- because the Macro Assembler will assemble that jump in its
- two-byte short form. The executable file created from this .ASM
- file with the Macro Assembler and Linker will then be the same
- length as the original file. This makes it less important to
- differentiate between labels and numeric constants, since the
- label values and their offsets within the file will be the
- same. The fundamental problem of disassembly is knowing whether
- the original assembly code defined a number as a label, which
- changes as a function of its position, or as a number that
- always remains the same. If you make changes in the assembly
- code, however, you must properly specify all values. You might
- as well remove all NOPs at the same time.
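- A minimal sketch of the NOP rule, assuming the reading given
- above: the original instruction is a 3-byte near JMP, and the
- Macro Assembler shortens an unqualified JMP to a backward label
- that lies within -128..+127 bytes of the end of a 2-byte short
- jump. The function name is hypothetical.

```python
def needs_nop_padding(jump_addr, target):
    """Return True if a 3-byte near JMP at jump_addr would be reassembled
    as a 2-byte short JMP, so a NOP is needed to keep the file length."""
    if target > jump_addr:
        # Forward reference: the assembler does not know the distance
        # yet and keeps the 3-byte near form.
        return False
    # A short jump's displacement is relative to the end of its 2 bytes.
    short_disp = target - (jump_addr + 2)
    return -128 <= short_disp <= 127
```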
-
- Labels are five characters long and begin with "L". Segment
- labels begin with "S". The remaining characters are the current
- instruction counter in hex form, thus making each label unique
- and showing its location in the original file. The instruction
- counter is continuous throughout the assembly code without
- resetting at segment boundaries. The segment labels are then in
- byte as opposed to paragraph form. In those cases where a label
- value is modified by an ASSUME statement, the original value is
- included as a comment in the referencing instruction so that it
- may be easily changed back if it was not intended as a location.
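- The label scheme is simple enough to express directly. This
- Python helper (make_label is a hypothetical name) shows how an
- instruction counter value maps to a label; a segment starting at
- paragraph 38H appears as S0380 because segment labels are in byte
- form.

```python
def make_label(addr, segment=False):
    """Build an ASMGEN-style label: 'L' (or 'S' for a segment start)
    followed by the 16-bit instruction counter as four hex digits."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("the instruction counter is limited to 16 bits")
    return ("S" if segment else "L") + format(addr, "04X")
```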
-
- The word "Relocatable" is printed at the end of any line that
- contains an absolute paragraph value. These are values that DOS
- modifies after loading but before executing a program. They are
- used for loading segment registers that are sensitive to the
- program location in memory. Relocatable values are not modified
- by ASSUMEs. ASMGEN converts these numbers from paragraph to byte
- values by multiplying them by sixteen so that they will fit
- within the 16-bit instruction counter field. When the paragraph
- value is negative or exceeds 0FFFH, it is left unchanged and a
- warning (??) is issued on that line. When a program larger than
- 64K bytes is being disassembled, it should be divided into
- smaller files.
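- The paragraph-to-byte conversion can be sketched like this,
- assuming the relocatable value arrives as a raw 16-bit word, so
- that "negative" means its sign bit is set. The function name is
- hypothetical.

```python
def convert_relocatable(paragraph):
    """Return (value, warn) for a 16-bit relocatable paragraph value.

    Values 0..0FFFH are multiplied by 16 so they fit the 16-bit
    instruction counter. Larger values, including any word with the
    sign bit set, are left unchanged and flagged for a ?? warning.
    """
    paragraph &= 0xFFFF
    if paragraph > 0x0FFF:
        return paragraph, True
    return paragraph * 16, False
```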
-
- All words are produced as labels, except when the /L switch has
- been turned OFF in the .SEQ file (explained later). The label
- name indicates its numeric value. If the value does not fall on
- an instruction boundary, the label is defined by an EQU
- statement that gives its position relative to the current
- instruction pointer. The Macro Assembler will therefore assume
- that it is a location, but it is easily changed to a constant
- since the value is given in the label name. The word OFFSET
- precedes a label whenever it is questionable whether it is a
- label or an immediate value. You must decide which of the labels
- should be constants and which of the constants should be labels,
- and change them accordingly. When changing labels to numbers, be
- sure to append an "H" if the number ends with a "D" or a "B",
- since the Macro Assembler will otherwise assume that it is
- decimal or binary.
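- When converting labels back to numbers by hand, the following
- sketch applies the "H" rule above. It also adds a leading 0 when
- the value starts with A-F, because the Macro Assembler reads a
- token that starts with a letter as a symbol name rather than a
- number. The helper name is hypothetical.

```python
def label_to_constant(label):
    """Turn an ASMGEN label such as 'L1F4D' into a hex constant that
    the Macro Assembler reads correctly under .RADIX 16."""
    digits = label[1:].upper()        # drop the leading 'L'
    if digits[0] in "ABCDEF":
        digits = "0" + digits         # a number must start with a digit
    if digits[-1] in "BD":
        digits += "H"                 # otherwise read as binary/decimal
    return digits
```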
-
- Bytes are always treated as constants. For words, the /L switch
- in the .SEQ file (explained later) produces numbers instead of
- labels when all references to the value are data segment
- immediate operations.
-
- An effective procedure to follow in attempting to understand the
- assembly code file is to look first for the message text area,
- the input commands, and the simpler subroutines. Then add label
- names to addresses in the .SEQ file (explained later) that
- remind you of their purpose. Add comments to the labels. If
- these names are well chosen, the larger routines eventually will
- become clear. The embedded references are produced as labels so
- they will retain their meanings as they are changed.
-
- It is also helpful to spend some time studying the structure of
- data areas. Vector tables, which are frequently used to control
- the program's flow, reveal the program's structure very quickly.
- If some routines do not have labels at the beginning, it is
- usually because the code or tables that reference them (or the
- segment register assumptions) are not properly defined in the
- .SEQ file.
-
-
- * READING THE REFERENCE TABLE (.TBL) *
-
- A referencee is defined as a number that is referenced somewhere
- in the program. It may be a program location or a numeric
- constant.
-
- A referencor is defined as the address in the program from
- which a reference is made to the referencee.
-
- Each entry is composed of a referencEE followed by a list of
- referencors. If more than one line is needed, additional lines
- are indented to the first referencor position. The referencEE is
- followed by an "S" if it includes references to the beginning of
- a segment. The referencor is followed by two letters, the first of
- which represents the segment register that is implied or prefixed
- in the referencing instruction. The second letter indicates the
- type of operation on the referencEE. When the reference entries
- are embedded in the assembly code, all values are preceded with
- the letter "L".
-
- -----------------------------------------------------------------
- 1st letter | 2nd letter
- SEG REGISTER | TYPE OF OPERATION
- -----------------------------------------------------------------
- C code | J jump M modify - INC, ADD, etc.
- S stack | C call I immediate - value or offset
- D data | R read T test or compare
- E extra | W write ? unknown or ESC instruction
- | P port
- ----------------|------------------------------------------------
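- A sketch for decoding referencor codes while reading a .TBL file.
- The letter meanings come from the table above; the assumption
- that the address and the two letters are written together (for
- example 0203DR, or L0203DR when embedded) should be checked
- against real ASMGEN output. decode_referencor is a hypothetical
- name.

```python
SEGMENTS = {"C": "code", "S": "stack", "D": "data", "E": "extra"}
OPERATIONS = {
    "J": "jump", "C": "call", "R": "read", "W": "write", "P": "port",
    "M": "modify", "I": "immediate", "T": "test or compare",
    "?": "unknown or ESC instruction",
}


def decode_referencor(entry):
    """Split a referencor such as '0203DR' into (address, segment, operation)."""
    if entry.startswith("L"):          # embedded entries carry an 'L' prefix
        entry = entry[1:]
    addr, seg, op = entry[:-2], entry[-2], entry[-1]
    return int(addr, 16), SEGMENTS[seg], OPERATIONS[op]
```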
-
-
-
- * WRITING/READING THE SEQUENTIAL INSTRUCTION FILE (.SEQ) *
-
- The sequential instruction file is a list of special instructions
- to ASMGEN which the user creates. The file takes the form of a
- list of hexadecimal addresses and single-letter instructions or
- generation switches. If used, the .SEQ file must be on the same
- diskette as the source file and have the same name as the source
- file with an extension of .SEQ. Each instruction in the file
- must be in one of the following formats:
-
- addr command
- or
- addr command ;comment
- or
- addr command label comment
- or
- addr command label comment ;comment
-
- "addr" represents the instruction pointer value. All addr values
- must be in numerical sequence in the file.
-
- "command" may be either a toggle switch or a generation
- instruction.
-
- "label" is optional and replaces the label generated for this
- address with this non-blank string.
-
- "comment" is optional and must be preceded by "label" unless the
- dummy label "." is used. Everything following "label" is treated
- as an address comment and will be printed in the ASM file behind
- the generated instruction. The address comment may be up to 255
- characters in length and should not contain a semi-colon.
-
- ";comment" is optional. Anything following a semi-colon in the
- .SEQ file instructions is considered as a comment in the .SEQ
- file only and is not added to the generated .ASM file.
-
- "label" and "comment" are not allowed when a generation switch is
- coded, but a ";comment" may be used to help clarify the .SEQ
- file.
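- The line formats above can be parsed with a few string
- operations. This Python sketch (parse_seq_line and
- check_sequence are hypothetical names) strips the ;comment,
- recognizes "&" continuation lines and the "." dummy label,
- accepts an optional "H" after the address, and checks that the
- addresses are in numerical sequence. Upper-casing the label
- follows the comment in the example .SEQ file later in this
- document.

```python
def parse_seq_line(line):
    """Parse one .SEQ instruction; returns None for comment-only lines."""
    text = line.split(";", 1)[0].strip()   # ';comment' stays in the .SEQ file
    if not text:
        return None
    parts = text.split(None, 3)            # addr, command, label, comment
    if parts[0] == "&":
        # Continuation of an A or X instruction, e.g. '& 380 DS'.
        return {"continuation": [p.upper() for p in parts[1:]]}
    if len(parts) < 2:
        raise ValueError("missing command: " + line)
    addr_text = parts[0].upper()
    if addr_text.endswith("H"):
        addr_text = addr_text[:-1]
    entry = {"addr": int(addr_text, 16), "command": parts[1].upper(),
             "label": None, "comment": None}
    if len(parts) > 2:
        if entry["command"].startswith("/"):
            raise ValueError("label and comment are not allowed with a switch")
        if parts[2] != ".":                # '.' is the dummy label
            entry["label"] = parts[2].upper()
        if len(parts) > 3:
            entry["comment"] = parts[3]
    return entry


def check_sequence(entries):
    """Raise ValueError if the addresses are not in numerical sequence."""
    last = -1
    for entry in entries:
        if entry is None or "addr" not in entry:
            continue
        if entry["addr"] < last:
            raise ValueError("address out of sequence: %04X" % entry["addr"])
        last = entry["addr"]
```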
-
- The .SEQ file is read into memory before the first pass starts.
- The addresses and commands are compressed, but each "label" and
- "comment" is held in memory exactly as written. As a result, the
- memory required for disassembly grows with each "label" and
- "comment" added to the .SEQ file.
-
-
- * DESCRIPTION OF GENERATION SWITCHES *
-
- THE VARIOUS TOGGLE SWITCHES ARE SET TO ON BY DEFAULT. Switches
- may be toggled on and off at any point in the .SEQ
- file/disassembly.
-
- All option switches except /M and /H can be either toggled or
- directly set by the user. A suffix of "+" turns the switch ON,
- and a suffix of "-" turns the switch OFF. Switches encountered
- in the file that have neither of these suffixes are toggled to
- the opposite of their state at the time; ON switches are turned
- OFF and OFF switches are turned ON.
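- The switch rules can be captured in a small state update. In this
- sketch the defaults follow the text (toggle switches ON, /H and
- /M not set); apply_switch and DEFAULTS are hypothetical names.

```python
DEFAULTS = {"B": True, "E": True, "F": True, "L": True, "O": True,
            "R": True, "T": True, "H": False, "M": False}


def apply_switch(state, token):
    """Apply a .SEQ switch such as '/T', '/O+' or '/B-' and return
    an updated copy of `state` (switch letter -> bool)."""
    state = dict(state)
    name, suffix = token[1].upper(), token[2:]
    if name in ("H", "M"):
        state[name] = True             # set once, kept until the next .SEQ file
    elif suffix == "+":
        state[name] = True
    elif suffix == "-":
        state[name] = False
    else:
        state[name] = not state[name]  # no suffix: toggle
    return state
```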
-
- /B - generate byte references
-
- When ON, byte and word references are included in the reference
- table. When OFF, only word references are generated.
-
- /E - embedded references in ASM file
-
- When ON, reference table entries are inserted in the text just
- before the referencee's definition statement. When OFF, these
- entries are not included with the disassembled text. The entire
- reference table can be printed with the "R" command.
-
- /F - 8087 mnemonics
-
- When ON, ESC instructions are produced. When OFF, ESC
- instructions are assumed to be 8087 instructions and 8087
- mnemonics are produced.
-
- /H - append hex "H"
-
- When this switch appears at any point in the .SEQ file, an "H" is
- appended to all hex numbers. This does not, of course, apply to
- the labels which are hex values preceded by the letter "L". The
- .RADIX 16 pseudo-op is omitted which allows the assembler's radix
- to default to decimal. This switch defaults to NO H APPEND.
- Note that it will be set only once. It retains its value until
- the next .SEQ file is read.
-
- /L - generate label or number
-
- When ON, all word references are treated as labels. When OFF, a
- word reference is treated as a constant if all referencors are
- data immediate types.
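- Stated as a predicate, using the referencor codes from the
- reference table description (DI = data segment, immediate), the
- /L rule reads roughly as follows. The function is a hypothetical
- illustration; treating a word with no referencors as a label is
- an assumption.

```python
def word_is_constant(referencor_codes, l_switch_on):
    """Return True if a word should be written as a number, not a label."""
    if l_switch_on:
        return False                   # /L ON: every word becomes a label
    return bool(referencor_codes) and all(code == "DI" for code in referencor_codes)
```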
-
- /M - suppress macro library
-
- When this switch appears at any point in the .SEQ file, no macro
- library is included in the text output. The DEFAULT IS THAT THE
- MACRO LIBRARY WILL BE INCLUDED. Note that this switch will be
- set only once. It retains its value until the next .SEQ file is
- read.
-
- /O - control ASM output
-
- When ON, ASMGEN will output the generated text. When OFF, output
- will be suppressed.
-
- /R - control TBL output
-
- When ON, ASMGEN will output the generated reference data. When
- OFF, the reference table is not printed.
-
- /T - control trace output
-
- When ON, up to 16 bytes of object code are included as comments
- in each line of the assembly code file. When OFF, object code is
- not included.
-
-
- * DESCRIPTION OF .SEQ FILE COMMANDS *
-
- A - assume
-
- The following lines contain ASSUMptions for segment register
- values. They become effective at the address specified by this
- instruction and may be modified anywhere in the disassembly. The
- required format for assumptions is:
-
- & 0400 DS
-
- The ampersand indicates a continuation of the A instruction.
-
- In this example, a data segment beginning at an instruction
- pointer value of 400 will be assumed until another A
- instruction changes it. CS, ES, and SS are also supported. The
- segment assumptions are used for effective address calculations
- only. The code segment assumption does not affect the
- instruction pointer value.
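- One way to picture how an assumption is applied: the operand
- offset is added to the instruction pointer value at which the
- segment is assumed to start, and the result names the label. The
- helper below is a hypothetical sketch of that reading; the exact
- form of the comment ASMGEN writes with the original value is
- not specified here.

```python
def resolve_operand(offset, segment_base):
    """Return (label, original_value) for an operand under an assumed
    segment start such as '& 0400 DS'. Wraps at 64K."""
    target = (segment_base + offset) & 0xFFFF
    label = "L" + format(target, "04X")
    original = format(offset, "04X")   # kept so the value can be restored
    return label, original
```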
-
- B - bytes
-
- The bytes encountered in the source file are assumed to have
- meaning as single byte values.
-
- C - code
-
- The bytes encountered in the source file are assumed to be valid
- 8088 machine language instructions.
-
- D - generate data operand
-
- The operand of the instructions is changed to immediate data.
- Subsequent bytes are interpreted as "C" (code follows).
-
- I - initial value for IP
-
- The hexadecimal value on this line overrides the instruction
- pointer value at the beginning of the file - not to be confused
- with the address at which execution begins. The default values
- are 0000 for EXE files and 0100H for COM and other files. The
- execution address following the END statement is omitted if this
- option is invoked.
-
- S - strings
-
- The bytes encountered in the source file are assumed to form
- text. Quoted text is produced for valid ASCII characters and
- byte values for others.
-
- # - defined length strings
-
- The first byte encountered in the source file contains the length
- of the character string which begins with the next encountered
- character. This length value may be overridden by a subsequent
- SEQ file instruction.
-
- $ - defined length strings
-
- The first byte encountered in the source file contains the length
- of the character string which begins with the next encountered
- character plus the length byte itself. This length value may be
- overridden by a subsequent SEQ file instruction.
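- The difference between the # and $ length bytes is easiest to
- see in code. A minimal sketch; read_counted_string is a
- hypothetical name.

```python
def read_counted_string(data, pos, length_includes_prefix):
    """Read one defined-length string whose length byte is data[pos].

    '#' strings: the length byte counts only the characters after it.
    '$' strings: the length byte also counts itself.
    Returns (string_bytes, position_after_string).
    """
    length = data[pos]
    if length_includes_prefix:
        length -= 1
    if length < 0:
        raise ValueError("a '$' length byte must be at least 1")
    start = pos + 1
    return data[start:start + length], start + length
```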
-
- W - words
-
- Pairs of bytes encountered in the source file are assumed to have
- meaning as word values.
-
- X - repeating data structure
-
- A cyclic data structure is assumed to begin at the specified
- instruction pointer value. The structure definition may follow
- and is prefixed by an ampersand (&) to indicate the continuation
- of this instruction. If the definition does not follow, then the
- most recent definition is used. If no structure is yet defined,
- then an error message is displayed.
-
- The following elements may be used to define the structure:
-
- & NNNN S - The next NNNN bytes are defined as string characters
- & NNNN B - The next NNNN bytes are defined as byte values
- & NNNN W - The next NNNN bytes are defined as word values
- & XXNN $ - The next sequence of bytes is defined as NN fields.
- Each field consists of a length byte and a string of
- characters. The length of each field is contained in
- the first encountered byte. The high byte (XX), if
- non-zero, is a bit mask that selects the length field
- within that byte. The length byte itself is sent to the
- output file unchanged; the masked field is then
- right-justified to obtain the length.
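- The masking step can be sketched as follows, using the 0E02
- example from the .SEQ listing later in this document: mask 0EH
- selects bits 3, 2 and 1, which are shifted right by one. Treating
- each length as including the length byte follows the comment on
- that example and is an assumption for other files. The function
- names are hypothetical.

```python
def mask_length(byte, mask):
    """Keep the bits of `byte` selected by `mask` and right-justify them.
    A zero mask means the whole byte is the length."""
    if mask == 0:
        return byte
    shift = (mask & -mask).bit_length() - 1   # zero bits below the mask
    return (byte & mask) >> shift


def split_dollar_fields(data, pos, xxnn):
    """Walk the NN fields of an '& XXNN $' element starting at data[pos].
    Returns (list_of_string_bytes, position_after_last_field)."""
    mask, count = xxnn >> 8, xxnn & 0xFF
    fields = []
    for _ in range(count):
        total = mask_length(data[pos], mask)  # assumed to include the length byte
        if total < 1:
            raise ValueError("field length must include the length byte")
        fields.append(data[pos + 1:pos + total])
        pos += total
    return fields, pos
```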
-
-
-
- * EXAMPLES OF .SEQ COMMANDS *
-
- This example .SEQ file shows all the possible instructions in the
- appropriate format.
-
- ;All switches are on at the beginning.
- 0 /T ;no object code as comments in output
- 0 /M ;no macro library in output
- 0 /H ;append "H" to all numbers
- 00H A ;assume the following segment values
- ;Note that the ampersand (&) indicates the extended ASSUME
- & 380 DS ;the data segment starts at 380 hex
- & 380 ES ;the extra segment starts at 380 hex
- 0200 I ;initialize the instruction pointer to 200
- 0200 /F ;introduce 8087 mnemonics (not ESC)
- 0200 /E ;no embedded references
- 0200 C ;code begins at 200
- 0203H W ;words are at 203
- 0207 C ;more code starting here
- 220 X ;complex data structure begins here
- & 3 W ;words
- & 1 B ;byte
- & 0E02 $ ;2 strings starting with the 2nd byte follow
- ;bits 3,2,1 of the first byte contain the length
- ;of the string including the length byte.
- ;the high byte (0E) is the mask.
- ;see also # and $ above
- & 1 B ;byte
- ;the structure repeats until 351
- 351 B ;bytes
- 358 C ;more code
- 380 S ;strings - list of messages
- 421 W ;words
- 4FD /B ;no further byte references
- 502 /R ;garbage here - turn off reference generation
- 502 /O ;and output
- 600H /O+ ;valid code - turn output back on
- 600 /R
- 600 C
- 1A60 /O- ;output file about to fill diskette - turn output
- ;off but keep scanning for references. another
- ;run will be needed to get the remaining code.
- 1B00 D ;treat operand as immediate data
- 1DFD /B+ ;continue with byte references
- 1F45 W user_prt ;user provided labels will
- 2256 S $MSG ;translate to upper case
-
-
- Comments may be included if preceded by a semicolon.
-
- Alphabetic characters may be either upper or lower case.
-
- An "H" may follow the hex address.
-
-
-
- * SAMPLE SESSION *
-
- The external command CHKDSK.COM will serve as an example for this
- sample session because it is short. The .SEQ file is also short
- and easy to generate. Only these few instructions are needed.
-
-
- 0100 /T ;include object code as comments in .ASM file
- 0100 /E ;simpler output without references
- 04F7H S ;messages
- 04F7H /H ;append "H" to numeric values
-
- Using DEBUG, browse through CHKDSK.COM to see how this was
- arrived at. Usually, but not always, the best procedure is to
- assume code. If the code appears unintelligible, display it in
- hex/ASCII. If it is not text, assume bytes. Label positions in
- the first disassembly may indicate that some locations should be
- words. Next, generate the .ASM file by typing
-
- ASMGEN CHKDSK.COM <enter>
- A <enter>
-
- The assembly code can be viewed on the screen. Then type
-
- A CHKDSK.ASM <enter>
-
- to save the assembly source code to a file. Then,
-
- R CHKDSK.TBL <enter>
-
- to save the cross-reference table to disk.
-
- The Macro Assembler, Link.exe and Exe2bin could now be used to
- assemble CHKDSK.ASM, link it to .EXE and convert it to a .COM
- file. No modification should be necessary in this case.
-
- If working with code that is to be modified, the symbol types
- must be correctly specified as locations or as constants. If
- they are constants, place them outside of any segment. The label
- names may then be changed to make the code more readable.
-
-